SemanticScuttle - klotz.me » klotz: machine learning+data science

klotz: machine learning* + data science*

Retrieval Augmented Generation in SQLite

The article explores the concept of Retrieval-Augmented Generation (RAG) using SQLite, specifically with the sqlite-vec extension and the OpenAI API. It outlines a simplified approach to RAG, moving away from complex frameworks and cloud vector databases, using SQLite's virtual tables for vector search and semantic understanding.
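
The retrieval step summarized above can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions, not the article's code: it does not load sqlite-vec or call the OpenAI API. The vectors are hand-written stand-ins for embeddings, stored as JSON text, and the hypothetical `top_k` helper ranks rows by brute-force cosine similarity in Python, where sqlite-vec would instead search a virtual table.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(conn, query_vec, k=2):
    """Return the k stored texts most similar to query_vec (brute force)."""
    rows = conn.execute("SELECT text, embedding FROM docs").fetchall()
    scored = [(cosine(query_vec, json.loads(emb)), text) for text, emb in rows]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (text TEXT, embedding TEXT)")

# Toy 3-dimensional vectors standing in for real embeddings (assumption).
docs = [
    ("SQLite stores data in a single file", [0.9, 0.1, 0.0]),
    ("Vector search finds nearest neighbours", [0.1, 0.9, 0.1]),
    ("Bread needs flour, water and yeast", [0.0, 0.1, 0.9]),
]
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [(text, json.dumps(vec)) for text, vec in docs],
)

# A query vector close to the "vector search" document.
context = top_k(conn, [0.2, 0.8, 0.0], k=2)

# The retrieved texts would then be placed in the prompt sent to the LLM.
prompt = "Answer using this context:\n" + "\n".join(context)
print(prompt)
```

Scanning every row is only workable for small tables; it is used here so the sketch runs with the standard library alone.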

2025-02-20 Tags: rag, llm, sqlite, sqlite-vec, vector search, machine learning, data science by klotz

The Big Book of Large Language Models

A comprehensive guide to Large Language Models by Damien Benveniste, covering various aspects from transformer architectures to deploying LLMs.

Language Models Before Transformers
Attention Is All You Need: The Original Transformer Architecture
A More Modern Approach To The Transformer Architecture
Multi-modal Large Language Models
Transformers Beyond Language Models
Non-Transformer Language Models
How LLMs Generate Text
From Words To Tokens
Training LLMs to Follow Instructions
Scaling Model Training
Fine-Tuning LLMs
Deploying LLMs

2025-02-11 Tags: llm, damien benveniste, machine learning, data science, book by klotz

The Data Scientist’s Dilemma: Answering 'What If?' Questions Without Experiments

The article discusses methods for data scientists to answer 'what if' questions regarding the impact of actions or events without having conducted prior experiments. It focuses on creating counterfactual predictions using machine learning techniques and compares a proposed method with Google's Causal Impact. The approach involves using historical data and control groups to estimate the effect of modifications, addressing challenges such as seasonality, confounders, and temporal drift.

2025-01-11 Tags: data science, causal inference, counterfactual prediction, machine learning, causal impact, time series, forecasting by klotz

An Overview of Feature Selection

This article provides an overview of feature selection in machine learning, detailing methods that maximize model accuracy and minimize computational costs, and introducing a novel method called History-based Feature Selection (HBFS).

2025-01-08 Tags: feature selection, machine learning, hbfs, optimization, data science by klotz

Explaining Machine Learning Models: A Non-Technical Guide to Interpreting SHAP Analyses

This article provides a non-technical guide to interpreting SHAP analyses, useful for explaining machine learning models to non-technical stakeholders, with a focus on both local and global interpretability using various visualization methods.

2024-11-25 Tags: shap, machine learning, interpretability, data science, xai by klotz

OpenAI Embeddings and Clustering for Survey Analysis — A How-To Guide

A guide on how to use OpenAI embeddings and clustering techniques to analyze survey data and extract meaningful topics and actionable insights from the responses.

The process involves transforming textual survey responses into embeddings, grouping similar responses through clustering, and then identifying key themes or topics to aid in business improvement.
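
The embed, cluster, and label steps can be sketched as follows. This is a minimal sketch, assuming tiny hand-written 2-D vectors in place of OpenAI embeddings and a plain k-means loop written from scratch. The `kmeans` helper, the sample responses, and the starting centroids are all hypothetical and chosen for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute."""
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # New centroid is the per-dimension mean of the cluster members.
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        centroids = new_centroids
    return labels, centroids


# Survey responses paired with made-up 2-D "embeddings" (assumption).
responses = [
    ("Checkout was slow", (0.9, 0.1)),
    ("Payment page kept loading", (0.8, 0.2)),
    ("Support team was friendly", (0.1, 0.9)),
    ("Great help from the agent", (0.2, 0.8)),
]
vectors = [vec for _, vec in responses]

# Fixed starting centroids keep the example deterministic.
labels, centroids = kmeans(vectors, centroids=[vectors[0], vectors[2]])

# Group the original texts by cluster so each group can be read for a theme.
clusters = {}
for (text, _), label in zip(responses, labels):
    clusters.setdefault(label, []).append(text)

for label, texts in sorted(clusters.items()):
    print(label, texts)
```

Real embeddings have hundreds or thousands of dimensions, so a library implementation of k-means would normally replace the loop above. The grouping step at the end stays the same.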

2024-10-26 Tags: embedding, clustering, survey analysis, data science, visualization, k-means, tsne by klotz

Using PCA for Outlier Detection

PCA (principal component analysis) can be effectively used for outlier detection by transforming data into a space where outliers are more easily identifiable due to the reduction in dimensionality and reshaping of data patterns.
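
One common way to apply this idea is to score each point by its reconstruction error after projecting it onto the leading principal component. The sketch below is a minimal illustration under stated assumptions, not the article's method: it uses toy 2-D data, finds the first component with power iteration in pure Python, and the `center`, `first_component`, and `reconstruction_errors` helpers are hypothetical names.

```python
import math


def center(points):
    """Subtract the per-dimension mean from every point."""
    n = len(points)
    means = [sum(dim) / n for dim in zip(*points)]
    return [[x - m for x, m in zip(p, means)] for p in points]


def first_component(centered, iterations=100):
    """Leading eigenvector of the covariance matrix via power iteration."""
    dims = len(centered[0])
    n = len(centered)
    cov = [
        [sum(p[i] * p[j] for p in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    vec = [1.0] * dims
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


def reconstruction_errors(points):
    """Distance from each point to its projection onto the first component."""
    centered = center(points)
    pc = first_component(centered)
    errors = []
    for p in centered:
        score = sum(a * b for a, b in zip(p, pc))
        residual = [a - score * b for a, b in zip(p, pc)]
        errors.append(math.sqrt(sum(r * r for r in residual)))
    return errors


# Points lying close to the line y = x, plus one point far off that line.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (3.0, 0.0)]
errors = reconstruction_errors(data)

# The point with the largest reconstruction error is the outlier candidate.
outlier_index = max(range(len(data)), key=lambda i: errors[i])
print(data[outlier_index])
```

With only one retained component, a point that sits far from the main direction of the data keeps a large residual, which is what makes it stand out.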

2024-10-24 Tags: pca, outlier detection, dimensionality reduction, data science, machine learning by klotz

Time Series — From Analyzing the Past to Predicting the Future

A deep dive into time series analysis and forecasting methods, providing foundational knowledge and exploring various techniques used for understanding past data and predicting future outcomes.

2024-10-24 Tags: time series, forecasting, data science, production engineering, machine learning by klotz

Autoencoders: An Ultimate Guide for Data Scientists

A detailed overview of the architecture, Python implementation, and future of autoencoders, focusing on their use in feature extraction and dimension reduction in unsupervised learning.

2024-10-18 Tags: autoencoders, machine learning, data science, neural networks, feature extraction, dimension reduction, unsupervised learning, encoder, decoder by klotz

Support Vector Classifier, Explained: A Visual Guide with Mini 2D Dataset

An explanation of the Support Vector Machine (SVM) algorithm with a focus on classification tasks, using a simple 2D dataset for illustration. It explains key concepts such as hard and soft margins, support vectors, kernel tricks, and optimization problems.

2024-10-01 Tags: support vector machine, classification, machine learning, data science, kernel trick, optimization by klotz
